PoLAPACK: parallel factorization routines with algorithmic blocking
Author
Abstract
LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and they have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block-partitioned algorithms so that they perform matrix-matrix operations, that is, so that they obtain the highest performance by maximizing reuse of data in the upper levels of the memory hierarchy, such as cache. Since parallel computers have different ratios of computation to communication, the optimal computational block size differs from machine to machine if an algorithm is to deliver its maximum performance. Therefore, the data matrix should be distributed with the machine-specific optimal block size before the computation. A block size that is too small or too large makes it nearly impossible to obtain good performance on a machine; in such a case, obtaining better performance may require a complete redistribution of the data matrix. In this paper, we present parallel LU, QR, and Cholesky factorization routines with "algorithmic blocking" on a 2-dimensional block-cyclic data distribution. With algorithmic blocking, it is possible to obtain near-optimal performance irrespective of the physical block size. The routines are implemented on the Intel Paragon and the SGI/Cray T3E and compared with the ScaLAPACK factorization routines.
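To make the role of the computational block size concrete, the following is a minimal, single-process Python/NumPy sketch (not the PoLAPACK code, and without pivoting) of a right-looking blocked LU factorization. The block size nb here is purely a tuning parameter of the loop, independent of how the matrix is stored; the algorithmic blocking described in the abstract exploits exactly this separation on the distributed matrix, so the computational block size need not match the physical distribution block size. The function name blocked_lu and all parameter choices are illustrative assumptions.

import numpy as np

def blocked_lu(A, nb):
    """Overwrite A with its LU factors (unit-lower L below the diagonal,
    U on and above the diagonal), processing nb columns at a time."""
    n = A.shape[0]
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # Panel factorization: unblocked (Level-2) elimination of nb columns.
        for j in range(k, k + kb):
            A[j+1:, j] /= A[j, j]
            A[j+1:, j+1:k+kb] -= np.outer(A[j+1:, j], A[j, j+1:k+kb])
        if k + kb < n:
            # Triangular solve for the block row of U:  L11 * U12 = A12.
            L11 = np.tril(A[k:k+kb, k:k+kb], -1) + np.eye(kb)
            A[k:k+kb, k+kb:] = np.linalg.solve(L11, A[k:k+kb, k+kb:])
            # Rank-kb update of the trailing submatrix (Level-3 work).
            A[k+kb:, k+kb:] -= A[k+kb:, k:k+kb] @ A[k:k+kb, k+kb:]
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant, so no pivoting needed
    LU = blocked_lu(A.copy(), nb=32)
    L = np.tril(LU, -1) + np.eye(n)
    U = np.triu(LU)
    print(np.allclose(L @ U, A))

The panel factorization is Level-2 work, while the triangular solve and the trailing-matrix update are Level-3 operations whose granularity is set by nb; this is the knob that algorithmic blocking tunes independently of the data layout.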
Similar resources
Prototyping Parallel LAPACK using Block-Cyclic Distributed BLAS
Given an implementation of Distributed BLAS Level 3 kernels, the parallelization of dense linear algebra libraries such as LAPACK can be achieved easily. In this paper, we briefly describe the implementation and performance on the AP1000 of Distributed BLAS Level 3 for the rectangular r × s block-cyclic matrix distribution. Then, the parallelization of the central matrix factorization and the trid...
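As a rough illustration of the r × s block-cyclic distribution mentioned above (a sketch under assumed conventions, not the AP1000 implementation), the mapping from a global matrix entry to its owning process and local index on a P × Q process grid can be written as follows; the function name owner_and_local and the example values are hypothetical.

def owner_and_local(i, j, r, s, P, Q):
    """Return ((prow, pcol), (li, lj)): the grid coordinates of the process
    owning global entry (i, j), and its local indices on that process,
    for r x s blocks dealt out cyclically over a P x Q grid."""
    bi, bj = i // r, j // s              # global block coordinates
    prow, pcol = bi % P, bj % Q          # cyclic assignment of blocks to the grid
    li = (bi // P) * r + i % r           # local row index on the owning process
    lj = (bj // Q) * s + j % s           # local column index on the owning process
    return (prow, pcol), (li, lj)

# Example: entry (7, 10) with 2 x 3 blocks on a 2 x 2 process grid.
print(owner_and_local(i=7, j=10, r=2, s=3, P=2, Q=2))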
A Comparison of Lookahead and Algorithmic Blocking Techniques for Parallel Matrix Factorization
Partitioning and Blocking Issues for a Parallel Incomplete Factorization
The purpose of this work is to provide a method which exploits the parallel blockwise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust and efficient preconditioners based on a parallel incomplete factorization.
Parallel Genetic Algorithm Using Algorithmic Skeleton
Algorithmic skeletons have received attention as an efficient method of parallel programming in recent years. Using this method, the programmer can implement parallel programs easily. In this study, a set of efficient algorithmic skeletons is introduced for use in implementing a parallel genetic algorithm (PGA). A performance model is derived for each skeleton that makes the comparison of skeletons po...
Matrix factorization routines on heterogeneous architectures
In this work we consider a method for parallelizing matrix factorization algorithms on systems with Intel® Xeon Phi™ coprocessors. We provide performance results of matrix factorization routines that implement this approach and are available in the Intel® Math Kernel Library (Intel MKL), on the Intel® Xeon® processor line with Intel Xeon Phi coprocessors. Summary: New heterogeneous systems consisting...
Journal: Concurrency and Computation: Practice and Experience
Volume 13, Issue –
Pages –
Year of publication: 2001